fix(openai): permanently disable OAuth accounts with a missing refresh_token and an expired access_token #2514
Open
is7Qin wants to merge 1 commit into
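When expires_at has passed and refresh_token is missing, token_provider only returned an error and did not degrade the account in any way. The OAuth 401 branch of HandleUpstreamError likewise applied only the 10-minute cooldown, without checking whether the account was able to refresh at all. Together, the two paths meant that an account without a refresh_token kept being selected and failed at the token stage every time, which users saw as persistent 502s.

token_provider.GetAccessToken: on "expired with no refresh_token", call SetError to permanently disable the account and clear its cache, using a background context so that an early end of the request ctx does not interfere with persistence.

ratelimit_service 401 OAuth branch: when refresh_token is empty, call SetError directly; expires_at is no longer rewritten and SetTempUnschedulable is no longer called. Cache invalidation is kept. The path for accounts that have a refresh_token (RT) is untouched.

Tests are added or adjusted to cover both paths; existing tests for the RT path gain a refresh_token field to preserve their original intent.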
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Permanently disables OAuth accounts that hit 401 or token expiry without a refresh_token, since they cannot self-heal and would otherwise be repeatedly selected, causing recurring 502s.
Changes:
- In RateLimitService.HandleUpstreamError, OAuth 401 with missing/blank refresh_token now triggers SetError instead of a 10-minute temp-unschedulable cooldown.
- In OpenAITokenProvider.GetAccessToken, expired access_token + missing refresh_token now calls a new helper that marks the account as errored and clears its cached token.
- Added unit tests covering both new code paths.
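The branching described for the OAuth 401 path can be sketched as a small classifier. This is an illustrative sketch only, not the code from the PR: the Action type, classifyOAuth401, hasRefreshToken, and the map[string]string credentials shape are all assumptions made for the example. It shows one way to treat a whitespace-only refresh_token the same as a missing one.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is a hypothetical label for how the 401 handler treats an account.
type Action string

const (
	ActionSetError          Action = "set_error"          // permanent: status=error
	ActionTempUnschedulable Action = "temp_unschedulable" // 10-minute cooldown
)

// hasRefreshToken reports whether the credentials carry a usable refresh_token.
// A whitespace-only value counts as missing. The map shape is an assumption.
func hasRefreshToken(credentials map[string]string) bool {
	return strings.TrimSpace(credentials["refresh_token"]) != ""
}

// classifyOAuth401 picks the handling for an OAuth account that received a 401.
// Without a refresh_token the account cannot self-heal, so it is disabled
// permanently; otherwise the cooldown path is kept.
func classifyOAuth401(credentials map[string]string) Action {
	if !hasRefreshToken(credentials) {
		return ActionSetError
	}
	return ActionTempUnschedulable
}

func main() {
	fmt.Println(classifyOAuth401(map[string]string{"access_token": "a"}))    // set_error
	fmt.Println(classifyOAuth401(map[string]string{"refresh_token": "   "})) // set_error
	fmt.Println(classifyOAuth401(map[string]string{"refresh_token": "rt"}))  // temp_unschedulable
}
```

Keeping the check in one helper means the "missing" and "blank" cases cannot drift apart between call sites.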
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| backend/internal/service/ratelimit_service.go | Short-circuit OAuth 401 path when refresh_token is missing/blank, calling handleAuthError. |
| backend/internal/service/ratelimit_service_401_test.go | Tests for missing and blank refresh_token cases; updates existing tests to include refresh_token. |
| backend/internal/service/openai_token_provider.go | Adds disableAccountMissingRefreshToken helper invoked when token expired & refresh_token absent. |
| backend/internal/service/openai_token_provider_test.go | Test asserting account is disabled exactly once when refresh_token missing. |
Summary
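- When GetAccessToken detects that the access_token has expired and the refresh_token is missing, it calls SetError to permanently disable the account and clears the cache. Previously it only returned an error, so the account stayed active in the DB, was selected repeatedly, and failed at the token stage every time, which users saw as persistent 502s.
- OAuth 401 branch of HandleUpstreamError: when the refresh_token is missing, call SetError directly to permanently disable the account, with no SetTempUnschedulable and no rewrite of expires_at. An account without an RT cannot be healed by any path during the 10-minute cooldown, so the end of the cooldown only produces another 502.
- Cache invalidation (tokenCacheInvalidator.InvalidateToken) and the existing path for accounts with an RT (rewrite expires_at, 10-minute cooldown, pickup by the background TokenRefreshService) are untouched.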
Why
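In production, the OAuth account with account_id=2881 had passed its expires_at and had no refresh_token in its credentials. token_provider returned an error directly without triggering any scheduling-level degradation, and the handler treated that error as an ordinary failure rather than an UpstreamFailoverError, so it neither switched accounts nor removed the current one. The next request would most likely select the same account again, producing a loop of 502s. This fix makes both independent entry points recognize "missing RT" as a permanent failure and persist it as status=error.
Test plan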
- go build ./...
- go test -tags=unit ./internal/service/ -run "RateLimit|ErrorPolicy|OAuth401|TokenRefresher|TokenProvider|RefreshAPI|OAuthRefresh"
- TestOpenAITokenProvider_NoRefreshTokenExpired_DisablesAccount: covers the token_provider fallback path
- TestRateLimitService_HandleUpstreamError_OAuth401NoRefreshTokenSetsError: two sub-cases covering a completely absent RT and a whitespace-only RT
- Existing tests pass after the refresh_token field is added
Notes
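- SetError rather than SetTempUnschedulable: a missing refresh_token is not a transient fault and cannot heal with time, so permanent disablement is more accurate and matches the existing convention for token_invalidated / token_revoked in ratelimit_service.
- SetError persists with context.Background(), so an early end of the request ctx does not undermine the degradation.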